Voting experts: An unsupervised algorithm for segmenting sequences

Authors

  • Paul R. Cohen
  • Niall M. Adams
  • Brent Heeringa

Abstract

We describe a statistical signature of chunks and an algorithm for finding chunks. Although there is no formal definition of chunks, they can be reliably identified as configurations with low internal entropy (unpredictability) and high entropy at their boundaries. We show that the log frequency of a chunk is a measure of its internal entropy. The Voting-Experts algorithm exploits this signature of chunks to find word boundaries in text from four languages and episode boundaries in the activities of a mobile robot.
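To make the two statistics in the abstract concrete, here is a minimal Python sketch, assuming a sequence of discrete tokens (characters here) and n-gram counts up to a hypothetical maximum length `max_n`. Boundary entropy is computed as the Shannon entropy of the token that follows an n-gram, and log frequency is used as an inverse proxy for internal entropy. The function names `ngram_stats`, `boundary_entropy`, and `log_frequency` are illustrative and are not taken from the authors' code.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(seq, max_n):
    """Count every n-gram (n = 1..max_n) and the tokens that follow it."""
    freq = Counter()
    successors = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            gram = tuple(seq[i:i + n])
            freq[gram] += 1
            if i + n < len(seq):  # an n-gram at the very end of seq has no successor
                successors[gram][seq[i + n]] += 1
    return freq, successors


def boundary_entropy(gram, successors):
    """Shannon entropy (bits) of the token that follows `gram`.

    High values mean the continuation is unpredictable, which is read as
    evidence that a chunk boundary falls right after `gram`.
    """
    counts = successors.get(gram)
    if not counts:
        return 0.0
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def log_frequency(gram, freq):
    """Natural-log frequency; higher values are read as lower internal entropy."""
    return math.log(freq[gram]) if freq[gram] else float("-inf")
```

For example, with `freq, succ = ngram_stats("abcabd", 2)`, the bigram "ab" occurs twice and is followed once by "c" and once by "d", so `boundary_entropy(("a", "b"), succ)` is 1.0 bit, while "a" is always followed by "b", so `boundary_entropy(("a",), succ)` is 0.0.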

Similar resources

Hierarchical Voting Experts: An Unsupervised Algorithm for Segmenting Hierarchically Structured Sequences

This paper extends the Voting Experts (VE) algorithm (Cohen, Adams, & Heeringa 2007) to segment hierarchically structured sequences. The original algorithm was tested on text segmentation, and made use of two proposed characteristics of chunks, namely low internal entropy and high boundary entropy of segments. VE looks for these two properties, and uses them to segment sequences of tokens. It i...

An Unsupervised Algorithm for Segmenting Categorical Timeseries into Episodes

This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The VOTINGEXPERTS algorithm first collects statistics about the frequency and boundary entropy of ngrams, then passes a window over the series and has two “expert methods” decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in four languages...
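The collect-statistics, slide-a-window, let-two-experts-vote procedure described above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the window length, the z-score standardization of statistics across n-grams of equal length, the exact rule each expert uses to pick a split, the vote threshold, and the local-maximum cutting rule are all assumptions made for the example, and every function name is hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pstdev


def build_stats(seq, depth):
    """N-gram counts and successor counts for n = 1..depth."""
    freq, succ = Counter(), defaultdict(Counter)
    for n in range(1, depth + 1):
        for i in range(len(seq) - n + 1):
            gram = tuple(seq[i:i + n])
            freq[gram] += 1
            if i + n < len(seq):
                succ[gram][seq[i + n]] += 1
    return freq, succ


def entropy(counts):
    """Shannon entropy (bits) of a Counter of successor tokens."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def standardize(values):
    """Assumption: z-score each n-gram's value against n-grams of the same length."""
    by_len = defaultdict(list)
    for gram, v in values.items():
        by_len[len(gram)].append(v)
    stats = {n: (mean(vs), pstdev(vs)) for n, vs in by_len.items()}
    out = {}
    for gram, v in values.items():
        mu, sd = stats[len(gram)]
        out[gram] = (v - mu) / sd if sd > 0 else 0.0
    return out


def voting_experts(seq, window=3, threshold=1):
    """Hypothetical sketch: each expert casts one vote per window position."""
    if window < 2:
        raise ValueError("window must be at least 2")
    freq, succ = build_stats(seq, window)
    z_freq = standardize({g: math.log(c) for g, c in freq.items()})
    z_ent = standardize({g: entropy(c) for g, c in succ.items()})
    votes = [0] * (len(seq) + 1)  # votes[k] counts votes for a cut before seq[k]

    for start in range(len(seq) - window + 1):
        w = tuple(seq[start:start + window])
        splits = range(1, window)  # keep both halves of the window non-empty
        # Frequency expert: the split whose two halves have the highest
        # summed standardized log frequency.
        f = max(splits, key=lambda k: z_freq[w[:k]] + z_freq[w[k:]])
        # Entropy expert: cut after the prefix with the highest boundary entropy.
        e = max(splits, key=lambda k: z_ent.get(w[:k], 0.0))
        votes[start + f] += 1
        votes[start + e] += 1

    # Assumption: cut at local maxima of the vote count that reach the threshold.
    cuts = [k for k in range(1, len(seq))
            if votes[k] >= threshold
            and votes[k] > votes[k - 1] and votes[k] >= votes[k + 1]]
    segments, prev = [], 0
    for k in cuts:
        segments.append(seq[prev:k])
        prev = k
    segments.append(seq[prev:])
    return segments
```

For example, `voting_experts("thecatsatonthemat" * 3, window=3)` returns a list of non-empty substrings whose concatenation is the input; no particular segmentation is claimed here, since the result depends on the assumed window, threshold, and cutting rule. Standardizing per n-gram length is one simple way to make short and long n-grams comparable before the frequency expert sums their scores.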

Bootstrap Voting Experts

BOOTSTRAP VOTING EXPERTS (BVE) is an extension to the VOTING EXPERTS algorithm for unsupervised chunking of sequences. BVE generates a series of segmentations, each of which incorporates knowledge gained from the previous segmentation. We show that this method of bootstrapping improves the performance of VOTING EXPERTS in a variety of unsupervised word segmentation scenarios, and generally impr...

Layered Mereotopology

An Unsupervised Algorithm for Finding Episode Boundaries

This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The VotingExperts algorithm first collects statistics about the frequency and boundary entropy of ngrams, then passes a window over the series and has two “expert methods” decide where in the window boundaries should be drawn. The algorithm segments text into words successfully in four languages. ...

Journal title:
  • Intell. Data Anal.

Volume: 11

Publication date: 2007